
Fix pdb / breakpoint() hang in workflow code #1568

Open
elidlocke wants to merge 1 commit into temporalio:main from elidlocke:pdb-hang-repro

Conversation

@elidlocke elidlocke commented Jun 1, 2026


What was changed

When debug_mode=True on the Worker (or TEMPORAL_DEBUG=1), breakpoint() inside workflow code now opens an interactive pdb prompt — including from a sandboxed workflow run under pytest. Without debug_mode, the sandbox raises a clear error pointing the user at debug_mode=True. Pieces:

  • Inline dispatch on the main thread (debug mode). In debug_mode, activations run on the asyncio main thread (scheduled via loop.call_soon to avoid nesting inside the dispatch task's __step()), so pdb's input() reaches the TTY.
  • Targeted sandbox relaxation (debug mode). In debug_mode, breakpoint is removed from the sandbox's invalid builtins so the call can reach the worker hook. Nothing else is relaxed.
  • Custom Pdb subclass. Drops into pdb at the workflow's own frame (not our indirection), suspends sandbox checks for the duration of each REPL interaction, and overrides q / Ctrl-D to continue the workflow instead of failing it with BdbQuit.
  • Better sandbox error without debug_mode. The sandbox's breakpoint restriction now carries a leaf_message directing the user at debug_mode=True (rather than the generic "mark the import as pass through" advice, which is the wrong fix here). Uses the same SandboxMatcher.leaf_message mechanism already in place for asyncio.as_completed.
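For readers unfamiliar with how q normally behaves: stock pdb raises `bdb.BdbQuit` in the debugged code when the user quits. The sketch below shows one way a `Pdb` subclass can turn q / Ctrl-D into "continue" using only the standard library. `ContinueOnQuitPdb` and `run_with_scripted_debugger` are hypothetical names for illustration, not the classes added in this PR, and the sketch leaves out the frame selection and sandbox suspension described above.

```python
import io
import pdb


class ContinueOnQuitPdb(pdb.Pdb):
    """Hypothetical sketch: q / Ctrl-D resume the program instead of raising BdbQuit."""

    def do_quit(self, arg):
        # Stock pdb calls set_quit() here, which makes the tracer raise
        # bdb.BdbQuit inside the debugged code. Resume execution instead.
        return self.do_continue("")

    # Rebind the aliases: the ones inherited from pdb.Pdb still point at
    # the stock do_quit.
    do_q = do_quit
    do_exit = do_quit

    def do_EOF(self, arg):
        # End of input (Ctrl-D) is treated the same way as q.
        return self.do_continue("")


def run_with_scripted_debugger(commands: str) -> int:
    """Hit a breakpoint inside body(), feeding `commands` to the debugger."""
    debugger = ContinueOnQuitPdb(
        stdin=io.StringIO(commands),   # scripted input instead of a TTY
        stdout=io.StringIO(),          # swallow the prompt output
        nosigint=True,
        readrc=False,                  # ignore any local .pdbrc
    )

    def body() -> int:
        x = 41
        debugger.set_trace()
        return x + 1

    return body()
```

With stock `pdb.Pdb`, feeding `q` at the prompt would abort `body()` with `BdbQuit`; with the subclass, `body()` runs to completion and returns normally.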

The workflow sys.breakpointhook is only installed when debug_mode=True. With debug_mode off, dispatch, sandbox config, and sys.breakpointhook are all identical to upstream main.
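A minimal sketch of the gating idea, using only the standard library: the hook is swapped in only when debug mode is on, and `sys.breakpointhook` is left untouched otherwise. `breakpoint_hook_if_debug` is a hypothetical helper for illustration; the PR's `_install_workflow_breakpoint_hook()` may be structured differently (for example, it may not restore the previous hook).

```python
import sys
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def breakpoint_hook_if_debug(
    debug_mode: bool, hook: Callable[..., object]
) -> Iterator[bool]:
    """Install `hook` as sys.breakpointhook only while debug_mode is on."""
    if not debug_mode:
        # Nothing changes outside debug mode.
        yield False
        return
    previous = sys.breakpointhook
    sys.breakpointhook = hook
    try:
        yield True
    finally:
        # Put back whatever hook was active before.
        sys.breakpointhook = previous
```

The built-in `breakpoint()` forwards its arguments to whatever `sys.breakpointhook` currently is, so code inside the `with` block reaches `hook` without any change at the call site.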

Why?

breakpoint() and pdb.set_trace() inside workflow code can't reach a debugger today. In a sandboxed workflow, the sandbox raises an error — but the error tells the user to "mark the import as pass through," which is the wrong fix. In an unsandboxed workflow, the call falls into pdb on a thread without a working stdin and the worker thread hangs.

Three overlapping issues for the debug-mode path:

  1. Activations run on a ThreadPoolExecutor thread, so pdb's input() can't read the controlling TTY.
  2. The sandbox flags breakpoint as non-deterministic, so the call can't reach the debugger.
  3. pdb's cmdloop touches more sandbox-restricted internals at runtime (e.g. readline.get_completer) — relaxing the builtin alone isn't enough.
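The thread-placement part of issue 1 can be seen without any Temporal code. This is a standard-library sketch with hypothetical helper names; it only shows that work submitted to a `ThreadPoolExecutor` does not run on the submitting thread, which is the placement the new tests check in both modes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def on_calling_thread(caller_ident: int) -> bool:
    """Report whether this function is running on the thread identified by caller_ident."""
    return threading.get_ident() == caller_ident


def compare_placement() -> tuple:
    """Return (inline_result, pooled_result) for the same check."""
    caller = threading.get_ident()
    # Called directly: runs on the caller's thread.
    inline = on_calling_thread(caller)
    # Submitted to a pool: runs on a worker thread instead.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pooled = pool.submit(on_calling_thread, caller).result()
    return inline, pooled
```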

Direct synchronous activation from the dispatch coroutine doesn't work on Python 3.14:

RuntimeError: Cannot enter into task <workflow run task>
  while another task <_handle_activation> is being executed.

The dispatch task is mid-__step() when workflow.activate tries to step the workflow's own task; 3.14 refuses. await future after loop.call_soon suspends the dispatch task first.
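A minimal standard-library sketch of that pattern, assuming nothing about the worker's real internals: the synchronous callable is scheduled with `loop.call_soon`, and the calling coroutine awaits a future that the callback completes. `run_via_call_soon`, `fake_activation`, and `dispatch` are hypothetical names for illustration.

```python
import asyncio
import threading
from typing import Any, Callable


async def run_via_call_soon(fn: Callable[[], Any]) -> Any:
    """Run fn() as a plain event-loop callback and return its result."""
    loop = asyncio.get_running_loop()
    done: asyncio.Future = loop.create_future()

    def runner() -> None:
        # Runs on the loop thread, after the awaiting task has suspended,
        # so no task is mid-step while fn() executes.
        try:
            done.set_result(fn())
        except Exception as err:
            done.set_exception(err)

    loop.call_soon(runner)
    # Awaiting the future suspends this task before runner() is called.
    return await done


def fake_activation() -> tuple:
    """Stand-in for a synchronous activation; reports where it ran."""
    return threading.get_ident(), asyncio.current_task()


async def dispatch() -> tuple:
    return await run_via_call_soon(fake_activation)
```

Inside the callback, `asyncio.current_task()` returns `None`, which is the property that matters here: the callable is free to step another task because the dispatch task is no longer executing.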

Complements #1249 (sandbox passthrough for IDE debuggers). Independent change, different debugger.

Checklist

  1. Closes #1104 (Setting debug_mode in a Worker still doesn't allow the user of breakpoints)

  2. How was this tested:

  • tests/worker/test_breakpoint_hang.py — five tests covering thread placement (both modes), breakpoint in a sandboxed workflow lands at the user's frame with locals visible, q/Ctrl-D continues cleanly, and (without debug_mode) the sandbox error message points at debug_mode=True. 5/5 pass on Python 3.13 and 3.14.
  • Manual: drop breakpoint() into any workflow's run() body, run via pytest -s (or a standalone python script), confirm the (Pdb) prompt opens at the user's frame with locals in scope.
  3. Any docs updates needed?
  • Yes. Adds a "Debugging Workflows with breakpoint() / pdb" subsection to the README under Workflow Sandbox, with a runnable example and the workflow-task-timeout caveat.

Comment thread temporalio/worker/_workflow.py Outdated
Comment thread temporalio/worker/_workflow.py Outdated
elidlocke force-pushed the pdb-hang-repro branch 2 times, most recently from bff1622 to 8b93ffb on June 8, 2026 16:32
elidlocke requested a review from tconley1428 on June 8, 2026 16:42
Comment thread temporalio/worker/_workflow.py Outdated
- self._deadlock_timeout_seconds = None if debug_mode else 2
+ self._deadlock_timeout_seconds = None if self._debug_mode else 2
+
+ _install_workflow_breakpoint_hook()

Contributor

This should probably only happen during debug mode as well. It may not make any actual difference, but it would be good to give that assurance that nothing is changing outside of debug mode.

@elidlocke elidlocke Jun 8, 2026

Author

The hook is intentionally always installed. The only case it catches is breakpoint() called from workflow code without debug_mode set, which is #1104's original silent hang. Gating on debug_mode would remove the error in exactly the scenario I think should be converted from silent hang to loud error.

Should I add a code comment making the always-on rationale explicit?

Contributor

I don't see anything in #1104 which indicates a need to enable breakpoints without debug_mode. Doesn't it explicitly say that it in fact fails quite loudly with sandboxing errors? What exactly is the scenario you are trying to address with this? Unsandboxed attempts to use a debug breakpoint without debug mode?

Author

Yeah, you're right.

What I was actually hitting: I dropped a breakpoint() into a sandboxed workflow and ran it under pytest, and it looked like a silent hang to me — the workflow task kept retrying forever and pytest was buffering the warning until I re-ran with -s. At that point the sandbox error did show up, just like #1104 describes. So it's not actually silent — the real problem is that the error tells me to "mark the import as pass through," which isn't really helpful feedback. In any case, the process-wide hook was the wrong tool.

Changed approach in the latest push:

  1. _install_workflow_breakpoint_hook() now only fires inside the if self._debug_mode: block in _workflow.py

  2. The sandbox error itself now points users at debug_mode=True instead of the generic "mark the import as pass through" advice. Uses the existing SandboxMatcher.leaf_message mechanism (same one already used for asyncio.as_completed → "use workflow.as_completed() instead"). breakpoint is restructured from a flat use entry to a child matcher with a custom leaf_message, and _importer.py:restrict_built_in was wired to read child.leaf_message so per-builtin overrides actually flow through.
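To make the second point concrete, here is a simplified sketch of how a per-child message can override generic advice. `Matcher`, `GENERIC_ADVICE`, and `restriction_message` are hypothetical stand-ins written for this comment; they are not the real `SandboxMatcher` or `restrict_built_in`, and the message strings are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional

# Illustrative wording only, not the sandbox's actual error text.
GENERIC_ADVICE = "Mark the import as pass through if this is intentional."


@dataclass(frozen=True)
class Matcher:
    """Hypothetical, simplified stand-in for a sandbox restriction matcher."""

    # Names restricted with no custom message (the "flat use entry" form).
    use: FrozenSet[str] = frozenset()
    # Names restricted through a child matcher, which may carry its own message.
    children: Dict[str, "Matcher"] = field(default_factory=dict)
    leaf_message: Optional[str] = None


def restriction_message(matcher: Matcher, name: str) -> Optional[str]:
    """Return the error text for a restricted builtin, or None if it is allowed."""
    child = matcher.children.get(name)
    if child is not None:
        # A per-builtin leaf_message wins over the generic advice.
        return f"Cannot use {name}. {child.leaf_message or GENERIC_ADVICE}"
    if name in matcher.use:
        return f"Cannot use {name}. {GENERIC_ADVICE}"
    return None


INVALID_BUILTINS = Matcher(
    use=frozenset({"input", "open"}),
    children={
        "breakpoint": Matcher(
            leaf_message="Set debug_mode=True on the Worker to use breakpoint()."
        )
    },
)
```

In this sketch, moving `breakpoint` from `use` to `children` is what lets it carry a targeted message while `input` and `open` keep the generic one.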

When debug_mode=True (or TEMPORAL_DEBUG=1), breakpoint() inside workflow
code now opens an interactive pdb prompt -- including from a sandboxed
workflow run under pytest. Four pieces:

- Inline dispatch on the asyncio main thread (via loop.call_soon to
  avoid nesting inside the dispatch task's __step() and tripping
  Python 3.14's task-entry validation).
- breakpoint removed from the sandbox's invalid builtins so the call
  reaches the worker hook. Nothing else is relaxed.
- A Pdb subclass that lands at the workflow's own frame, suspends
  sandbox checks during each REPL interaction, and overrides q/Ctrl-D
  to continue the workflow instead of failing it with BdbQuit.
- A defensive sys.breakpointhook that raises a clear RuntimeError when
  breakpoint() is called from a workflow worker thread without
  debug_mode, replacing the previous silent hang.

When debug_mode is not set, the worker's dispatch and sandbox config
are unchanged.

Adds a README subsection on debugging workflows and five tests at
tests/worker/test_breakpoint_hang.py. Verified on Python 3.13 and 3.14.

Closes temporalio#1104.

Development

Successfully merging this pull request may close these issues.

Setting debug_mode in a Worker still doesn't allow the user of breakpoints
